18 research outputs found

    New Grapheme Generation Rules for Two-Stage Model-based Grapheme-to-Phoneme Conversion

    The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme, or G2P, conversion) is used in speech synthesis and recognition, pronunciation learning software, spoken term detection and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems, and because many problems with G2P conversion have been reported, we propose a novel two-stage model-based approach, implemented using an existing weighted finite-state transducer-based G2P conversion framework, to improve the performance of the G2P conversion model. The first-stage model is built for automatic conversion of words to phonemes, while the second-stage model uses the input graphemes and the output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules, which add extra detail to the vowel and consonant graphemes appearing within a word. Compared with previous approaches, the evaluation results indicate that our approach using rules focusing on the vowel graphemes slightly improved accuracy on the out-of-vocabulary dataset and consistently increased accuracy on the in-vocabulary dataset.
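
    As a rough illustration of the two-stage idea and the vowel-focused grapheme generation rules described above, the Python sketch below tags vowel graphemes with extra positional detail before conversion. The rule set, tag names, and the expand_graphemes/two_stage_g2p helpers are hypothetical illustrations under assumed conventions, not the paper's actual rules or WFST framework.

```python
# Minimal sketch of vowel-focused grapheme generation rules (hypothetical,
# not the paper's actual rule set). Each vowel grapheme is annotated with
# extra detail (here, its position in the word) before being passed to the
# first-stage grapheme-to-phoneme model.

VOWELS = set("aeiou")

def expand_graphemes(word):
    """Split a word into graphemes, tagging vowels with positional detail."""
    graphemes = []
    for i, ch in enumerate(word.lower()):
        if ch in VOWELS:
            if i == 0:
                tag = "initial"
            elif i == len(word) - 1:
                tag = "final"
            else:
                tag = "medial"
            graphemes.append(f"{ch}/{tag}")   # e.g. "a/initial"
        else:
            graphemes.append(ch)
    return graphemes

def two_stage_g2p(word, first_stage, second_stage):
    """Two-stage conversion: stage 1 maps graphemes to phonemes, stage 2
    rescores using both the input graphemes and the stage-1 phonemes."""
    graphemes = expand_graphemes(word)
    first_pass = first_stage(graphemes)           # candidate phoneme sequence
    return second_stage(graphemes, first_pass)    # final phoneme sequence

if __name__ == "__main__":
    # Dummy stand-ins for the trained first- and second-stage models.
    stage1 = lambda gs: [g.split("/")[0].upper() for g in gs]
    stage2 = lambda gs, ps: ps
    print(two_stage_g2p("phoneme", stage1, stage2))
```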

    Construction of a corpus of elderly Japanese speech for analysis and recognition

    Tokushima University; Aichi Prefectural University; University of Yamanashi. LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-09.
    We have constructed a new speech data corpus using the utterances of 100 elderly Japanese people, in order to improve the accuracy of automatic recognition of the speech of older people. Humanoid robots are being developed for use in elder care nursing facilities because interaction with such robots is expected to help clients maintain their cognitive abilities, as well as provide them with companionship. In order for these robots to interact with the elderly through spoken dialogue, a high-performance speech recognition system for the speech of elderly people is needed. To develop such a system, we recorded speech uttered by 100 elderly Japanese with an average age of 77.2, most of them living in nursing homes. Another corpus of elderly Japanese speech, S-JNAS (Seniors-Japanese Newspaper Article Sentences), was developed previously, but the average age of its participants was 67.6. Since the target age for nursing home care is around 75, much higher than that of most of the S-JNAS samples, we felt a more representative corpus was needed. In this study we compare the performance of our new corpus with both the Japanese read speech corpus JNAS (Japanese Newspaper Article Speech), which consists of adult speech, and S-JNAS, the senior version of JNAS, by conducting speech recognition experiments. Data from JNAS, S-JNAS and CSJ (Corpus of Spontaneous Japanese) were each used as training data for the acoustic models. We then used our new corpus to adapt the acoustic models to elderly speech, but we were unable to achieve sufficient performance when attempting to recognize elderly speech. Based on our experimental results, we believe that development of a corpus of spontaneous elderly speech and/or special acoustic adaptation methods will likely be necessary to improve the recognition performance of dialog systems for the elderly.
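
    The recognition experiments described above compare acoustic models trained on different corpora. As a point of reference only, the sketch below shows the standard word error rate (WER) metric used for such comparisons; the example sentences are placeholders, not results from the paper.

```python
# Minimal sketch of the word error rate (WER) metric used to compare
# recognition results from acoustic models trained on different corpora
# (e.g. JNAS, S-JNAS, CSJ). Illustrative only; the demo strings below are
# invented, not data from the study.

def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    ref = "kyou wa ii tenki desu"
    hyp = "kyou wa tenki desu"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")
```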

    Developing a Method of Recommending E-Learning Courses Based on Students’ Learning Preferences

    In designing e-learning, it is desirable to take each learner's learning style into account. This study proposes a way to present information about a student's expected adaptability to a course in which he or she wishes to enroll, based on the student's responses to a learning preference questionnaire administered at the beginning of the course. Applying real data to the derived model confirmed that course adaptability can be estimated before the course is taken, and that information to help the student improve his or her course adaptability can be provided based on the questionnaire responses.
    15th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2011), September 12-14, 2011, Kaiserslautern, Germany.
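
    To make the idea of estimating course adaptability from questionnaire responses concrete, here is a minimal least-squares sketch. The questionnaire items, scores, and coefficients are invented for illustration and are not the paper's model or data.

```python
# Hypothetical sketch: estimate course adaptability from learning preference
# questionnaire scores with ordinary least squares. All numbers are invented
# for illustration; the paper's actual model and data are not reproduced.
import numpy as np

# Each row: questionnaire scores for one student (e.g. preference for
# asynchronous learning, preference for using ICT in learning).
X = np.array([[4.0, 3.5],
              [2.0, 4.0],
              [3.0, 2.5],
              [5.0, 4.5]])
y = np.array([3.8, 3.0, 2.7, 4.6])   # observed course adaptability

# Fit adaptability ~ b0 + b1*x1 + b2*x2 by least squares.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_adaptability(scores):
    """Predict adaptability for a new student before the course starts."""
    return float(coef[0] + coef[1:] @ np.asarray(scores))

print(predict_adaptability([3.5, 3.0]))
```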

    The Proposal of the System That Recommends e-Learning Courses Matching the Learning Styles of the Learners

    In providing e-learning, it is desirable to build an environment that suits each student's learning style. In this study, using a questionnaire developed by the authors to measure students' preferences for asynchronous learning and for the use of ICT in learning, we explore the relationship between a student's learning preferences measured before and after a course and his or her adaptability to that course. The results of multiple regression analyses, excluding the changes in learning preferences that may occur during the course, show that a student's learning adaptability can be estimated to some extent from his or her learning preferences measured before the course starts. Based on this result, we propose a system that recommends e-learning courses suited to a student before the student takes them.
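
    The recommendation step proposed above could, under one reading, rank courses by the adaptability predicted from pre-course preferences. The sketch below shows that ranking idea only; the course names, preference keys, and per-course predictors are placeholders, not the system described in the paper.

```python
# Hypothetical sketch of recommending courses by predicted adaptability.
# Course names and the per-course prediction functions are invented stand-ins
# for models fitted from pre-course learning preference data.

def recommend_courses(preferences, course_models, top_n=3):
    """Return the courses with the highest predicted adaptability.

    course_models maps a course name to a function that predicts the
    student's adaptability to that course from the preference scores.
    """
    scored = [(course, predict(preferences)) for course, predict in course_models.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

if __name__ == "__main__":
    models = {
        "Statistics I":     lambda p: 0.6 * p["async"] + 0.4 * p["ict"],
        "Programming":      lambda p: 0.3 * p["async"] + 0.7 * p["ict"],
        "Academic Writing": lambda p: 0.8 * p["async"] + 0.2 * p["ict"],
    }
    prefs = {"async": 3.5, "ict": 4.2}
    for course, score in recommend_courses(prefs, models, top_n=2):
        print(f"{course}: predicted adaptability {score:.2f}")
```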

    A new speech corpus of super-elderly Japanese for acoustic modeling

    The development of accessible speech recognition technology will allow the elderly to more easily access electronically stored information. However, the necessary level of recognition accuracy for elderly speech has not yet been achieved using conventional speech recognition systems, due to the unique features of the speech of elderly people. To address this problem, we have created a new speech corpus named EARS (Elderly Adults Read Speech), consisting of the recorded read speech of 123 super-elderly Japanese people (average age: 83.1), as a resource for training automated speech recognition models for the elderly. In this study, we investigated the acoustic features of super-elderly Japanese speech using our new speech corpus. In comparison to the speech of less elderly Japanese speakers, we observed a slower speech rate and extended vowel duration for both genders, a slight increase in fundamental frequency for males, and a slight decrease in fundamental frequency for females. To demonstrate the efficacy of our corpus, we also conducted speech recognition experiments using two different acoustic models (DNN-HMM and Transformer-based), trained with a combination of data from our corpus and speech data from three conventional Japanese speech corpora. When using the DNN-HMM trained with EARS and speech data from existing corpora, the character error rate (CER) was reduced by 7.8% (to just over 9%), compared to a CER of 16.9% when using only the baseline training corpora. We also investigated the effect of training the models with various amounts of EARS data, using a simple data expansion method. The acoustic models were also trained for various numbers of epochs without any other modifications. When using the Transformer-based end-to-end speech recognizer, the character error rate was reduced by 3.0% (to 11.4%) by using a doubled EARS corpus together with the baseline data for training, compared to a CER of 13.4% when only data from the baseline training corpora were used.
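
    One simple reading of the data expansion mentioned above is to repeat the EARS utterance list when building the training manifest, so the elderly speech carries more weight relative to the much larger baseline corpora. The sketch below illustrates that interpretation only; the file paths and repetition factor are assumptions, not the paper's exact procedure.

```python
# Rough sketch of a simple data expansion step: repeat the EARS utterance
# list (e.g. doubling it) before combining it with the baseline corpora in
# the training manifest. Paths and counts are illustrative assumptions.

def build_training_manifest(baseline_utts, ears_utts, ears_repeats=2):
    """Combine baseline corpora with a repeated copy of the EARS data."""
    return list(baseline_utts) + list(ears_utts) * ears_repeats

if __name__ == "__main__":
    baseline = ["jnas/utt%04d.wav" % i for i in range(5)]
    ears = ["ears/utt%04d.wav" % i for i in range(3)]
    manifest = build_training_manifest(baseline, ears, ears_repeats=2)
    print(len(manifest), "training utterances")   # 5 + 3*2 = 11
```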

    Development and Evaluation of Gaze Tracking Integrated E-Learning Contents

    No learning can take place without attention to the input. In order to promote students' attention in e-learning content, this study focuses on gaze tracking data as a kind of enhanced input. Gaze-tracking-integrated, video-and-slide-synchronized e-learning contents were developed and evaluated in the study. An eye mark is shown on the e-learning content to indicate where the teacher is explaining. A comparison survey was conducted. The results show that students' learning attention is promoted by the gaze tracking data in e-learning.
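
    Displaying an eye mark in sync with the lecture requires mapping recorded gaze samples to the current playback time. The sketch below shows one simple way to do that by linear interpolation; the sample format (timestamp in seconds, normalized x/y coordinates) is an assumption for illustration, not the format used in the study.

```python
# Simplified sketch of synchronizing recorded teacher gaze data with the
# lecture video so an eye mark can be drawn where the teacher is explaining.
# The (timestamp, x, y) sample format is an illustrative assumption.
import bisect

def gaze_at(gaze_samples, t):
    """Linearly interpolate the gaze point at video time t (seconds).

    gaze_samples is a list of (timestamp, x, y) tuples sorted by time,
    with x and y normalized to [0, 1] of the slide/video area.
    """
    times = [s[0] for s in gaze_samples]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return gaze_samples[0][1:]
    if i >= len(gaze_samples):
        return gaze_samples[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = gaze_samples[i - 1], gaze_samples[i]
    w = (t - t0) / (t1 - t0)
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))

if __name__ == "__main__":
    samples = [(0.0, 0.10, 0.20), (0.5, 0.40, 0.25), (1.0, 0.80, 0.60)]
    print(gaze_at(samples, 0.75))   # eye-mark position at 0.75 s
```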
